Refactor turbomind attention by precomputing cos/sin #2801

irexyc · 2024-11-25T03:34:58Z

Motivation

Precompute the rotary embedding cos/sin values in advance and reduce the number of parameters passed to the prefill/decode kernels.
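As a rough illustration of the idea, the sketch below builds a cos/sin lookup table once on the host so that an attention kernel would only need a pointer to the table instead of the individual RoPE parameters. It is a minimal sketch, not the code in this PR: the names `PrecomputeCosSin` and `ApplyRotary`, the interleaved (cos, sin) layout, and the use of plain unscaled RoPE are all assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the PR's actual code): builds a [max_pos, dim] table
// of interleaved (cos, sin) pairs for plain, unscaled RoPE. `dim` must be even.
std::vector<float> PrecomputeCosSin(int max_pos, int dim, float rope_base)
{
    std::vector<float> table(static_cast<std::size_t>(max_pos) * dim);
    for (int pos = 0; pos < max_pos; ++pos) {
        for (int i = 0; i < dim; i += 2) {
            // Inverse frequency of the i-th rotary pair: base^(-i / dim).
            const float inv_freq = 1.0f / std::pow(rope_base, static_cast<float>(i) / dim);
            const float angle    = pos * inv_freq;
            float*      out      = &table[static_cast<std::size_t>(pos) * dim + i];
            out[0]               = std::cos(angle);
            out[1]               = std::sin(angle);
        }
    }
    return table;
}

// Hypothetical helper: rotates one (x0, x1) pair by a precomputed (cos, sin) entry.
inline void ApplyRotary(float& x0, float& x1, float c, float s)
{
    const float r0 = x0 * c - x1 * s;
    const float r1 = x0 * s + x1 * c;
    x0             = r0;
    x1             = r1;
}
```

With a table like this, scaling variants (linear, dynamic, YaRN, Llama 3) would only affect how the table is filled, while the kernel-side lookup stays the same.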

lvhan028 · 2024-11-27T09:45:53Z

src/turbomind/models/llama/unified_decoder.cc

@@ -81,6 +83,11 @@ void UnifiedDecoder<T>::forwardSelfAttn(T*                             attn_io,
    inputs.insert("h_cu_q_len", {MEMORY_CPU, TYPE_INT32, {batch_size + 1}, h_cu_q_len_});
    inputs.insert("h_cu_k_len", {MEMORY_CPU, TYPE_INT32, {batch_size + 1}, h_cu_k_len_});

+    if (rotary_emb_) {


Is there any case in which rotary_emb_ is a nullptr?

lvhan028 · 2024-12-02T11:26:02Z

src/turbomind/models/llama/llama_params.h

@@ -59,22 +59,45 @@ struct MoeParam {
    std::vector<int> expert_num;
 };

+enum class RotaryScalingType


Suggested rename: RotaryScalingType -> RopeType

lvhan028 · 2024-12-02T11:53:28Z

src/turbomind/models/llama/rotary_emb.h

+
+struct InnerYarnRopeParam {
+    float attention_factor;
+    float yarn_ramp_inv_factor_div_2;


I think we can remove the prefix "yarn_"

lvhan028 · 2024-12-02T11:53:50Z

src/turbomind/models/llama/rotary_emb.h

+};
+
+struct InnerLlama3RopeParam {
+    float llama3_inv_scaling_factor;


The prefix "llama3_" can be removed as well.
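For reference, the two structs might look as follows once both prefix suggestions are applied. This is only a sketch based on the members visible in the diff excerpts above; the actual structs in rotary_emb.h may contain additional fields, and the renamed member names are inferred from the review comments rather than copied from the final code.

```cpp
// Sketch only: limited to the fields shown in the review excerpts.
struct InnerYarnRopeParam {
    float attention_factor;
    float ramp_inv_factor_div_2;  // previously yarn_ramp_inv_factor_div_2
};

struct InnerLlama3RopeParam {
    float inv_scaling_factor;  // previously llama3_inv_scaling_factor
};
```

Since the struct names already identify the RoPE variant, the shorter member names lose no information.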

irexyc added 3 commits November 25, 2024 02:41

use precomputed cos sin

05d011c

remove unused

7b74b72

Merge remote-tracking branch 'origin/main' into rope

589cacb

lvhan028 added the improvement label Nov 25, 2024

lvhan028 changed the title ~~Use precomputed cos/sin~~ Refactor turbomind attention by precomputing cos/sin Nov 27, 2024

lvhan028 reviewed Nov 27, 2024

irexyc added 2 commits December 2, 2024 06:00

Merge remote-tracking branch 'origin/main' into rope

45f0968

split rope params

0e4c315

lvhan028 reviewed Dec 2, 2024

irexyc added 2 commits December 2, 2024 12:07

remove prefix yarn_, llama3_

ea6112e

fix test_attention

0513e12